Asymptotic Optimality for Decentralised Bandits
Authors
Abstract
We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (International conference on artificial intelligence and statistics, pp 3471–3481, 2020). We provide a theoretical analysis of the regret incurred which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.
Similar resources
DCOPs and bandits: exploration and exploitation in decentralised coordination
Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, thus limiting their practical applicability. To address this shortcoming, we...
Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and...
Asymptotic Optimality of Balanced Routing
Consider a system with K parallel single servers, each with its own waiting room. Upon arrival, a job is to be routed to the queue of one of the servers. Finding a routing policy that minimizes the total workload in the system is known to be a difficult problem in general. Even if the optimal policy is identified, the policy would require the full queue length information at the arrival of each job; fo...
Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem
Consider the problem of sampling sequentially from a finite number of $N \ge 2$ populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$, and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k$-th time it is sampled. It is assumed that for each fixed $i$, $\{X^i_k\}_{k \ge 1}$ is a sequence of i.i.d. normal random variables, with unknown mean $\mu_i$ and unknown variance $\sigma_i^2$. The objec...
Active Search and Bandits on Graphs using Sigma-Optimality
Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...
Journal
Journal title: Dynamic Games and Applications
Year: 2022
ISSN: 2153-0793, 2153-0785
DOI: https://doi.org/10.1007/s13235-022-00451-1